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Abstract 

In human cells, estrogenic signals induce cyclical association and 
dissociation of specific proteins with the DNA in order to activate 
transcription of estrogen-responsive genes. These oscillations can be 
modeled by assuming a large number of sequential reactions repre- 
sented by linear kinetics with random kinetic rates. Application of 
the model to experimental data predicts robust binding sequences in 
which proteins associate with the DNA at several different phases of 
the oscillation. Our methods circumvent the need to derive detailed 
kinetic graphs, and are applicable to other oscillatory biological pro- 
cesses involving a large number of sequential steps. 

The central dogma of molecular biology states that for a given gene, the 
sequence of nucleotide bases in DNA is transcribed into messenger RNA 
which in turn is translated into a specific sequence of amino acids that con- 
stitute the protein coded by the initial gene. In higher organisms, such as 
ourselves, transcriptional control is a crucial step in the regulation of gene ex- 
pression. This control is modulated by the configuration of proteins around 
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promoters (DNA regions in the proximity of genes that carry out the in- 
tegration of transcriptional signals). Although there are a large number of 
theoretical models of transcriptional control networks pQ and transcriptional 
control of a single gene in prokaryotes [2J, theoretical analysis of transcrip- 
tional control of a single gene in eukaryotes is much less developed [Sj. 

Histones and other protein molecules associate with the DNA to form 
chromatin, the constituent of the chromosomes of eukaryotes. In order for 
transcription to occur, chromatin must be unfolded from its condensed ge- 
ometry in which DNA is compactly wrapped around the histones. Although 
full details are still not well understood, it is clear that sequential chemical 
reactions between the histone molecules and specialized enzymes underlie the 
modification of the chromatin structure [3]. For example, acetylation of the 
histones leads to a more open chromatin configuration, by changing the local 
electrostatic equilibrium of the molecular ensemble around where the mod- 
ification is made, enabling transcription [Sj. On each histone protein there 
are a number of different amino acids sites at which chemical reactions can 
occur leading to a modification of the histone-DNA geometry. The histone 
code hypothesis posits that the modifications of the histones provide a code 
which governs the subsequent chemical processes leading to the remodeling 
of the chromatin [Hj. 

Recent experimental studies have demonstrated a cyclic ordered sequence 
of reactions and alterations of local chromatin structure in human breast can- 
cer cells grown in tissue culture [7]. The culture of approximately 2 x 10 6 
cells is initially synchronized. The addition of a hormone, estradiol, induces 
40 min oscillations of the transcriptional activation of the pS2 gene, which 
is a marker gene for estrogenic response. Due to loss of synchronization be- 
tween the cells, the observed oscillations slowly damp and reach constant 
levels after 8 hours jSj. A possible source of desynchronization, in addition 
to stochastic fluctuations at the level of the promoter, could be the variabil- 
ity among the cells, such as ATP levels or cell size. These oscillations are 
monitored by measuring the temporal association of specific proteins with 
the DNA measured at time intervals of as short as 5 minutes over a 3 hour 
period In Fig. Q we show the association profiles of four key proteins in- 
volved in the transcriptional activation of gene pS2. Estrogen receptor (ER) 
binds estradiol and initiates the transcription process. RNA Polymerase II, 
(Pol II) is a protein complex responsible for the transcription of genes. TRIP1 
and HDAC are two different proteins that are involved in the clearance of 
the promoter after each transcription cycle. In view of the complexity of the 
sequence, and the tiny numbers of molecules involved in the binding in each 
cell, it is currently impossible to derive detailed kinetic data about the rate 
constants of the individual reactions. Moreover, since it is not clear whether 
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Figure 1: (color online). Dynamics of proteins binding at the pS2 promoter 
following administration of estradiol in MCF-7 human breast cancer cells. 
The proportions of bound pS2 promoters with key transcription factors and 
cofactors are shown as a function of time. Based on data from Ref. 
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or not the cells are coupled to each other, the mechanism of the synchroniza- 
tion of the oscillation poses a challenge for theoretical interpretation. In this 
Letter, we propose a simple model for the oscillation based on a large num- 
ber of sequential chemical reactions and transformations of the chromatin. 
Based on the analysis of the model, we are able to predict specific timings of 
the association of the protein complexes with the chromatin that reproduces 
the observed dynamics in Fig. ^ 

We assume that there is a network of proteins interacting together and, 
sequentially, with the chromatin. Each reaction induces a modification of 
the substrate complex that in turn enables the next step in the sequence so 
that the reactions are assumed to be irreversible. Further, we assume that 
the various transcription factors and cofactors involved in the reactions are 
present in sufficiently high concentration that the reaction rates a, are con- 
stant over time. This model is schematized in Fig. El A key parameter in 
our model is the number m of sequential steps in the cycle. Many transcrip- 
tion complexes contain more than 50 proteins, which may be partially or 
completely assembled on the promoter ^Oj- Of the order of 100 histone (or 
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Figure 2: (color online). A schematic diagram of the model by sequential re- 
cruitment of protein molecules to the chromatin. x\ represents the chromatin 
at the pS2 promoter. The Xi, 2 < i < m represent the protein complexes 
that form successively on the promoter, at rate a,, leading to the activation 
of transcription. 
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other proteins) modifications have been identified during transcriptional ac- 
tivation [11 j . These sequential histone modifications are associated with the 
histone code for transcriptional activation. Based on these considerations, 
we estimate that m is at least 200. From the data in Fig. ^ the period of the 
oscillation T is about 40 minutes. If all reaction rates are assumed identical 
(i.e., Oj = a), and choosing m = 200, then we have a = m/T = 5min _1 . 

Since, generally, there are only two copies of each gene in each cell, we 
first consider the dynamics using a stochastic model in which the probability 
of a given reaction per unit time is equal to the product of the rate constant 
for that reaction and the number of potential reactants present. The time 
steps between reactions obey a Markov process. The results of carrying out 
the simulation using the Gillespie algorithm for 1, 10, or 100 cells are 
shown in Fig. EK-C. 

The chemical scheme presented in Fig.|2]can be interpreted as a sequential 
Poisson process in which the duration t before the next reaction takes place 
follows the distribution p(t) = ae~ at . Then for any individual cell after 
synchronization, the fc-th cycle has a mean starting time of — with a variance 
-Sr. Taking a = m/To, the starting time of the fc-th cycle has a variance 

kT 2 

of — Q -. This suggests the natural desynchronization of the system as the 
variance increases linearly with k. As m — * oo, the variance vanishes and the 
system behaves as a delay differential system as discussed in Ref . • Thus, 
just as in the data in Fig. [TJ the stochastic system displays oscillations for 
several cycles provided m is large enough. 
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Figure 3: (color online). A-C: Stochastic simulations for 1, 10, or 100 cells. 
D: Recruitment curves calculated from Eq. (JSJ). These computations are 
based on the binding sequences given in Fig. 0] (see below). We assume 
a, = 5min _1 and m = 200. The color (or grayscale) code is the same as in 
Fig.ffl 
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In the limit of a large number of cells, the stochastic dynamics can be 
approximated by the linear differential equations 

Xx(t) = a m x m (t) - aiXi(t) , 

Xiif) = ai-xXi-^t) - diXiit) , 2 < i < m. 

In the the eigenvalues of the Jacobian matrix are Xk = 

a (e~^~ — 1), 1 < k < m. We can rewrite as + ifik, where 

ak = a (cos#fc — 1), Pk — a sin^, 1 < k < m . (1) 

Here 6k = 2kir/m. Since the real parts of the eigenvalues are non-positive, 
Eqs. (JH)-([IJ) do not show sustained oscillations [T3j . 
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Assuming initial conditions, for the [xi\, of [1, 0, 0, . . . , 0], we can compute 
the solution for all variables: 



1 

Xi{t) = — 
m 



m/2-l 

l + (_l)V 2o * + 2 J2 e a ^cos[p k t~(t-l)9 k ] 

k=l 



(2) 



in the case when m is even (a similar expression holds when m is odd). The 
higher frequency terms decrease rapidly so that for m = 200, only the first 6 
or 7 terms give a significant contribution after the first period. The leading 
term e Ql * cos f3\t sets the period. From a Taylor expansion of this result, we 
find that the envelope of the leading term decays as e~ 2n t / mT ° ; where we set 
a = itl/Tq. This result is consistent with the finding that the oscillations are 
more persistent as the number of steps of the reaction increases. 

We now consider the effect of relaxing several of the unrealistic assump- 
tions in the model. If all reactions are reversible, with all forward rate coef- 
ficients equal to a and all backward reaction coefficients equal to b, the real 
and imaginary parts of the eigenvalues are a k = (a + &)(cos — — 1) and 
Ac = (a — b) sin^ 1 . This leads to an increased damping, and an increase 
in the oscillatory period that scales as b/a for small b in comparison to a. 
Consequently, the main results presented below also hold if the reactions are 
reversible. A more general discussion on the effects of reversible reactions in 
chains of linear reaction kinetics is given in Ref . JH] • 

Since in the biological system, the reaction rates are not identical, we now 
assume that the forward rates are distributed randomly with probability dis- 
tribution Q(di). Realizing that the waiting time for each reaction to occur 
is independent and identically distributed, the k-th cycle's starting time has 
mean and variance, km J ^^-dx and km J Q^r-dx, respectively. If, in partic- 
ular, Q(di) is the uniform distribution over the interval [a(l — d),a(l + d)}, 
with < d < 1, the k-th cycle's starting time has mean: 

km /•°( 1 +<0 dx km , 1 + d 

ln ~\ 1 > ( 3 ) 



(4) 



lad Ja(i-d) x 2ad 1 — d 

and variance: 

km r a ( l + d ) dx km 



2adJa(i-d) x 2 a 2 (l — d 2 ) 
Then the mean and variance of the k-th period are, respectively, (T k ) = 
(T) — l n jzi (independent of k) and a\ k = j^7j~j^j • Thus, on the curve 

(T) = T , we have the variance a\ k = ^^j^jrnj^p ; which, again, vanishes 
as m — > oo. 

The damping of the solutions of Eqs. (JT|)-(JH) is controlled by the least 
negative a k s. Let = a(l + euii) and e = cr a /a, where o a is the standard 
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deviation of the a^s. Assuming that is uniformly distributed in the interval 
[a(l — d),a(l + d)}, we have (a*) = a and cr = ad/\^3, and then, (a>i) = 
and {uif) = 1. Solving the characteristic equation for successive orders in e, 
we find that, to fourth order in e, (a>k) = a(cos9k — 1)(1 + o(e 4 )) , which is 
consistent with numerical results that show negligible dependence of (atf.) on 
d. However, the variance of the a^s is a^ k = + o(e 4 ) . This shows 

that the properties of low damping and synchronization of the oscillation, 
observed when the a^'s are identical, are conserved in the limit of large m 
when the rate constants are different. 

We now wish to fit the model to the experimentally observed binding 
profiles of the proteins in Fig. ^ Each protein is a component of several 
different Xi(t) complexes, but we do not know a priori which ones. We call 
Pj(t) the percentage of pS2 promoters bound to one or more molecules of 
protein j. Then we have 

m 

Pj(t) = E c m^(*)> ( 5 ) 

i=l 

where Cy is either or 1. Pj(t) is the quantity measured in ChIP experiments 
shown in Fig. ^ For each protein j, the binding sequence {cy} is determined 
by doing a least squares minimization of the data to the model. Because 
the first cycle is produced by a different sequence of chemical steps than 
the subsequent ones [7j, we consider only the time points such that T < 
t < 3T . The minimization procedure is done in 2 steps: (i) we apply the 
Nelder-Mead method to minimize the quadratic error, with the constraint 
that < Cjj < 1 [inj; (ii) we use the values of cy obtained in the first step 
as initial conditions to a method that uses Lagrange multipliers, minimizing 
again the mean square error. The latter step enables us to generate binary 
Cjj's. The result of that procedure is shown in the Fig. where the colored 
regions indicate the regions where cy = 1. The Pj(t), for each protein in 
Fig.d are plotted in Fig.Et). To test the robustness of these fitted sequences, 
we have carried out a number of numerical studies in which we performed fits 
of the model to the data relaxing several of our assumptions. In particular, 
we have tested for values of m = 100, 200, 300 and 400; addition of ±2% 
of noise to the data points (corresponding to the error reported in Ref. [Z]); 
changes in the vertical scaling of the data (up to 1.4); and selection of random 
reaction rates (provided that the period, for the selection of {a^}, is close 
to T , and the solutions not too damped). Although there can be slight 
changes in the values of i where cy = 1, or, in some circumstances, a change 
in the number of blocks in which Cy = 1, the main pattern of the cy's in 
Fig. 0] remain unchanged. 
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Figure 4: (color online). Binding sequences for the proteins in Fig. ^ For 
each protein, the colored regions indicate the indices % of the complexes Xj of 
which the protein is a component. 




The results in Fig. 01 in which the precise patterns of association of each 
protein are obtained, represent the main predictions of the current work. Al- 
though one might have anticipated that there would be a single recruitment 
block for each protein, the recruitment patterns for proteins considered here 
may actually occur in two or more blocks. Two experimental methods can 
be used currently to determine the dynamics of transcription: ChIP assays 
and fluorescence microscopy-based assays, but these methods seem to pro- 
duce conflicting results [T7]. Our theoretical predictions must be viewed in 
perspective of these two experimental methods. ChIP assays determine the 
binding of promoters with specific proteins, but not at the level of one pro- 
moter in a single cell. In contrast, fluorescence microscopy assays determine 
the mobility of proteins around individual promoters, but does not measure 
binding of individual proteins to one promoter. Using ChIP data, our model 
enables us to predict successive rounds of protein binding and unbinding. The 
protein mobility indicated by these multiple binding events, corroborates the 
observations from fluorescence microscopy assays, reconciling the observa- 
tions from the two experimental methods. Due to lack of precision at the 
scale of the isolated promoter, the experimental verification of our findings is 
currently impossible. However, these results can be correlated with what is 
known about the biological system. For example, five or six principal com- 
plexes, with well-defined functions, are successively formed on the promoter 
of pS2 It has been conjectured that the assembling of these complexes is 
orchestrated by ER at different timings of the cycle EJ- Our finding of 
five different binding times of ER matches perfectly that conjecture. 

To summarize, we have proposed a simple model for the oscillation ob- 
served in protein association and dissociation during transcriptional activa- 
tion in human cells. We have shown that the model produces oscillations 
with minimal damping for large values of m. Further, these properties are 
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conserved when the reaction rates are selected randomly. The current work 
demonstrates that realistic network architecture models may not be needed 
in order to unravel the mechanisms of complex reaction sequences at the 
subcellular level. Our approach relies rather on the finding that synchronous 
dynamics of protein assembly emerge as a consequence of the large number 
of intermediate reactions. Our methods should be useful to other systems 
in which many sequential steps take place but the detailed kinetics are not 
known. Fitting the model to the data in Fig. [T] resulted in predicted se- 
quences at a time resolution not possible experimentally and, as such, may 
be invaluable for experimental design and for interpretation of the mecha- 
nisms underlying transcriptional activation. 
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